Computational Metalexicography in Practice - Corpus-based support for the . . .

Authors

  • Vincent J. Docherty
  • Ulrich Heid
Abstract

Computational Metalexicography in Practice - Corpus-based support for the revision of a commercial dictionary

In a cooperation between dictionary publishers and computational linguists, raw material was produced for the revision of the German part of a bilingual German → English dictionary (Langenscheidts Handwörterbuch Englisch, Neubearbeitung 1991). First in a case study, the entries for headwords with the initial letter "p", and then, between August 1997 and March 1998, the full dictionary were systematically checked against a 300-million-word German newspaper corpus from the late 1980s and early 1990s. The objective was to find evidence to support updates of the lemma inventory of the dictionary and to enhance the example and collocation coverage. The data production from the corpora is automatic; the (manual, interactive) lexicographic procedures remain unchanged. To this end, standard corpus pre-processing (tokenizing, tagging, lemmatization) and a hierarchical set of query templates for collocation extraction were used. The dictionary was transformed into a specific data format (similar to database entries), and the examples contained in the articles were prepared for automatic querying. The results are of metalexicographic interest: they show the potential of refined macrostructural selection procedures, help to improve the documentation of readings through examples, and, generally, provide an example of the use of standard computational linguistic techniques for dictionary revision. The auxiliary resources constructed from the corpora in the same process (a verb frequency lexicon for German and a collection of noun-verb collocation candidates) are useful and relevant in their own right. Similarly, the tools used are mostly generic and thus reusable outside the specific context discussed here.
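The two checks the abstract describes (finding corpus lemmas absent from the headword list, and collecting noun-verb collocation candidates) can be illustrated with a minimal Python sketch. This is not the authors' tooling: the abstract does not specify the query templates, so plain sentence-level co-occurrence counting is swapped in here, and the toy corpus, the tag names, the `HEADWORDS` set, and all function names are hypothetical.

```python
from collections import Counter

# Toy pre-processed corpus (assumed format): each sentence is a list of
# (lemma, POS tag) pairs, as tokenizing, tagging and lemmatization might
# produce. All data below is invented for illustration.
SENTENCES = [
    [("Regierung", "NOUN"), ("Entscheidung", "NOUN"), ("treffen", "VERB")],
    [("Gericht", "NOUN"), ("Entscheidung", "NOUN"), ("treffen", "VERB")],
    [("Firma", "NOUN"), ("Mobiltelefon", "NOUN"), ("verkaufen", "VERB")],
    [("Kunde", "NOUN"), ("Mobiltelefon", "NOUN"), ("kaufen", "VERB")],
]

# Hypothetical headword list standing in for the dictionary's lemma inventory.
HEADWORDS = {
    "Regierung", "Entscheidung", "treffen", "Gericht",
    "Firma", "verkaufen", "Kunde", "kaufen",
}


def lemma_frequencies(sentences):
    """Count how often each lemma occurs in the corpus."""
    return Counter(lemma for sentence in sentences for lemma, _ in sentence)


def missing_headwords(sentences, headwords, min_freq=2):
    """Return corpus lemmas at or above min_freq that the dictionary lacks."""
    freqs = lemma_frequencies(sentences)
    return sorted(
        lemma for lemma, count in freqs.items()
        if count >= min_freq and lemma not in headwords
    )


def noun_verb_candidates(sentences, min_freq=2):
    """Count noun-verb pairs co-occurring in a sentence; keep frequent ones.

    Sentence-level co-occurrence is a simplification chosen for this sketch,
    not the hierarchical query templates mentioned in the abstract.
    """
    pairs = Counter()
    for sentence in sentences:
        nouns = {lemma for lemma, tag in sentence if tag == "NOUN"}
        verbs = {lemma for lemma, tag in sentence if tag == "VERB"}
        pairs.update((noun, verb) for noun in nouns for verb in verbs)
    return {pair: count for pair, count in pairs.items() if count >= min_freq}


if __name__ == "__main__":
    print(missing_headwords(SENTENCES, HEADWORDS))
    print(noun_verb_candidates(SENTENCES))
```

In a setting like the one described, the output of both functions would only be raw material: a lexicographer would still decide manually which candidates enter the dictionary.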


Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...


Gearing the Discursive Practice to the Evolution of Discipline: Diachronic Corpus Analysis of Stance Markers in Research Articles’ Methodology Section

Despite widespread interest and research among applied linguists in metadiscourse use, very little is known about how metadiscourse resources have evolved over time in response to the historically developing practices of academic communities. Motivated by this gap, the current research drew on a corpus of 874,315 words taken from three leading journals of applied linguistics in orde...


Tilt Table Practice Improved Ventilation in a Patient with Prolonged Artificial Ventilation Support in Intensive Care Unit

Patients who are on prolonged ventilator support in the critical care unit present a wide variety of complications, which range from reduction in oxygen uptake to various musculoskeletal impairments. Early mobilization and rehabilitation are encouraged to manage these complications effectively. Use of tilt table to motivate early mobilization in the intensive care unit for ventilator practices is no...


I-31: New Approaches for Luteal Phase Support in ART Cycles

Background: During a normal menstrual cycle, progesterone prepares the endometrium for pregnancy by stimulating proliferation in response to human chorionic gonadotropin produced by the corpus luteum. Many questions were raised about the role of follicular fluid aspiration on the granulosa cells at the time of oocyte retrieval during the ART cycles. The authors believe that oocyte retrieval might d...


Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners

Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...



Publication date: 1998